-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply normalization consistently in VLenBytes #570
base: main
Are you sure you want to change the base?
Conversation
None and 0 are treated like a 0 length string when computing lengths, and the same normalization should be applied to the value passed to PyBytes_AS_STRING. If this is not done, an assertion is hit in the python runtime (when compiled in debug mode).
numcodecs/vlen.pyx
Outdated
@@ -250,7 +250,10 @@ class VLenBytes(Codec): | |||
l = lengths[i] | |||
store_le32(<uint8_t*>data, l) | |||
data += 4 | |||
encv = PyBytes_AS_STRING(values[i]) | |||
b = values[i] | |||
if b is None or b == 0: # treat these as missing value, normalize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like on line 231? It seems like l = 0
in this case, there is nothing to copy, and these operations are wasted. I might be wrong...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't pass a bytes object, you will hit this assert when calling PyBytes_AS_STRING: https://github.com/python/cpython/blob/40fff90ae3d46843bb9d27c6a53ef61c861a3bb4/Include/cpython/bytesobject.h#L21
it would be equivalent to skip the PyBytes_AS_STRING and memcpy entirely if l = 0. Shall I update the PR to do that instead?
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #570 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 59 59
Lines 2334 2334
=======================================
Hits 2332 2332
Misses 2 2 |
None and 0 are treated like a 0 length string when computing lengths, and the same normalization should be applied to the value passed to PyBytes_AS_STRING. If this is not done, an assertion is hit in the python runtime (when compiled in debug mode).
TODO: